PCA Projection and Missing Data

Load dataset and dependencies

Here we load a genotype matrix with 159 individuals and ~30k SNPs, built by selecting three individuals from each Human Origins population.

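# install any dependencies that are not yet installed
packages <- c("tidyverse", "cowplot", "softImpute", "missMethods", "norm", "mvtnorm", "ggrepel", "plotly", "magrittr")
# compare against installed package names only (the row names), not every cell of the matrix
missing_packages <- packages[!packages %in% rownames(utils::installed.packages())]
if (length(missing_packages) > 0) install.packages(missing_packages)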
library(magrittr)
library(ggplot2)
library(plotly)
knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE, fig.width = 8, fig.height = 6)


setwd('~/Documents/exp_dat_reading_group_2021/session_4/')
source('helper_functions.R')

# load data
geno_matrix <- scan("geno_matrix_three.txt", what = "character") %>%
  strsplit("") %>%
  do.call(rbind, .) %>%
  apply(., 2, as.numeric)

context_info <- readr::read_csv("context_info_three.csv")
## Rows: 159 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Individual_ID, Group_Name, Country, Makro_Region
## dbl (2): Longitude, Latitude
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Global PCAs

Here we plot two PCAs. The first uses all 159 individuals. For the second, we hold out 9 individuals (shown with red dots and labels), compute a PCA from the remaining 150, and project the held-out individuals onto it. The projected individuals show some shrinkage towards the origin.

## projecting:   1  2  3  4  5  6  7  8  9 done.
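
The chunk that produced this projection is hidden, so the following is only a minimal sketch of the general idea, on made-up toy data. It assumes a plain `prcomp()` PCA without scaling; the notebook's own helper functions in `helper_functions.R` may work differently.

```r
set.seed(1)

# Toy genotype matrix (made-up data): 20 individuals x 200 SNPs coded 0/1/2
n_ind <- 20
n_snp <- 200
geno <- matrix(rbinom(n_ind * n_snp, size = 2, prob = 0.3), nrow = n_ind)

# Hold out three individuals and compute the PCA from the rest
held_out <- 1:3
ref <- geno[-held_out, , drop = FALSE]
pca_ref <- prcomp(ref, center = TRUE, scale. = FALSE)

# Project the held-out individuals: centre with the reference SNP means,
# then multiply by the reference loadings
projected <- predict(pca_ref, newdata = geno[held_out, , drop = FALSE])[, 1:2]

# The same projection written out by hand
manual <- sweep(geno[held_out, , drop = FALSE], 2, pca_ref$center) %*%
  pca_ref$rotation[, 1:2]

# Compare the spread of reference scores and projected scores on PC1
sd(pca_ref$x[, 1])
sd(projected[, 1])
```

When there are many more SNPs than individuals, the projected scores typically have a smaller spread than the reference scores, which is one way to see the shrinkage described above.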

Downsample data from projected individuals

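First compute the static “background” PCA from the reference individuals. This PCA stays fixed while the projected individuals are downsampled.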

Projections with three levels of downsampling

## projecting:   1  2  3  4  5  6 done.

## projecting:   1  2  3  4  5  6 done.

## projecting:   1  2  3  4  5  6 done.
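
The downsampling code itself is not shown, so here is a hedged sketch of how such an experiment could look on toy data. The function names (`downsample`, `project_zero_fill`, `project_lsq`) and the three fractions are hypothetical, not taken from the notebook. It contrasts two common ways of projecting a sample with missing genotypes: filling missing centred values with zero, and a least-squares fit that uses only the observed SNPs.

```r
set.seed(2)

# Toy data (made up): 30 individuals x 500 SNPs; individual 1 is projected
geno <- matrix(rbinom(30 * 500, size = 2, prob = 0.3), nrow = 30)
target <- geno[1, ]
pca_ref <- prcomp(geno[-1, ], center = TRUE, scale. = FALSE)
V <- pca_ref$rotation[, 1:2]

# Set a random fraction of SNPs to NA
downsample <- function(x, fraction) {
  drop_idx <- sample(length(x), size = floor(fraction * length(x)))
  x[drop_idx] <- NA
  x
}

# Option A: treat missing SNPs as the reference mean (zero after centring)
project_zero_fill <- function(x, center, loadings) {
  centered <- x - center
  centered[is.na(centered)] <- 0
  drop(centered %*% loadings)
}

# Option B: least-squares fit of the loadings to the observed SNPs only
project_lsq <- function(x, center, loadings) {
  obs <- !is.na(x)
  qr.solve(loadings[obs, , drop = FALSE], x[obs] - center[obs])
}

# Three arbitrary downsampling fractions (assumed, for illustration)
fractions <- c(0.5, 0.9, 0.99)
results <- t(sapply(fractions, function(f) {
  x <- downsample(target, f)
  c(fraction = f,
    zero_fill = project_zero_fill(x, pca_ref$center, V),
    lsq = project_lsq(x, pca_ref$center, V))
}))
results
```

Zero filling tends to pull the coordinates towards the origin roughly in proportion to the fraction removed, while the least-squares version avoids that systematic pull but becomes noisier as fewer SNPs remain.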

Now try many different downsample values, to see the effects of extremely large amounts of missing data.
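
The downsampling levels listed in the log for this step form a coarse grid plus a fine grid close to 1. A small sketch of that grid and of how many SNPs would remain at each level; `n_snps` is taken from the matrix dimensions printed further down in this notebook and is assumed to apply here as well.

```r
# Coarse grid, then a fine grid near complete missingness
coarse <- seq(0.5, 0.9, by = 0.1)
fine <- seq(0.91, 0.997, by = 0.003)
fractions <- c(coarse, fine)

# Assumed SNP count (from the dimensions reported later in the notebook)
n_snps <- 31813

# Number of SNPs left after removing each fraction
snps_left <- floor(n_snps * (1 - fractions))
data.frame(fraction = fractions, snps_left = snps_left)
```

At the most extreme level, 0.997, only about 95 of the 31,813 SNPs would remain.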

## projecting:   1  2  3  4  5  6 done.
## Downsampling: 0.5, 0.6, 0.7, 0.8, 0.9, then 0.91 to 0.997 in steps of 0.003
## (35 levels; 6 individuals projected at each level)
## Done.

Plot the results as an animation that lets you scan through the fraction of data removed.
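
The plotting chunk is hidden. One way to build such an animation with plotly, which this notebook already loads, is the `frame` argument. The sketch below uses placeholder coordinates (random numbers, not real results) and hypothetical column names.

```r
library(plotly)
set.seed(4)

# Placeholder data (made up): 6 projected individuals at 3 downsampling levels
fractions <- c(0.5, 0.9, 0.99)
anim_df <- do.call(rbind, lapply(fractions, function(f) {
  data.frame(
    individual = paste0("ind", 1:6),
    fraction = f,
    PC1 = rnorm(6),
    PC2 = rnorm(6)
  )
}))

# One animation frame per downsampling fraction, with a slider to step through
p <- plot_ly(
  anim_df,
  x = ~PC1, y = ~PC2,
  frame = ~fraction,
  text = ~individual,
  type = "scatter", mode = "markers"
) %>%
  animation_slider(currentvalue = list(prefix = "Fraction removed: "))
p
```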

Try to get better boundaries

Here we repeat the downsampling 10 times at each level, to better observe the range of effects.
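
A minimal sketch of such a repeated experiment on toy data, assuming a least-squares projection on the retained SNPs (the notebook's actual projection method is not shown). `project_downsampled` and the chosen fractions are hypothetical.

```r
set.seed(3)

# Toy data (made up): 30 individuals x 500 SNPs; individual 1 is projected
geno <- matrix(rbinom(30 * 500, size = 2, prob = 0.3), nrow = 30)
pca_ref <- prcomp(geno[-1, ], center = TRUE, scale. = FALSE)
V <- pca_ref$rotation[, 1:2]
x <- geno[1, ] - pca_ref$center

# Keep a random subset of SNPs and project with least squares
project_downsampled <- function(x, V, fraction) {
  keep <- sample(length(x), size = round((1 - fraction) * length(x)))
  qr.solve(V[keep, , drop = FALSE], x[keep])
}

fractions <- c(0.5, 0.9, 0.99)
n_rep <- 10

# Standard deviation of the projected coordinates across repeats
spread <- sapply(fractions, function(f) {
  reps <- replicate(n_rep, project_downsampled(x, V, f))  # 2 x n_rep matrix
  apply(reps, 1, sd)
})
dimnames(spread) <- list(c("PC1", "PC2"), paste0("frac_", fractions))
spread
```

Each column of `spread` summarises how much the projected position varies between repeats at one downsampling level.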

## projecting:   1  2  3  4  5  6  7  8  9 done.
## Downsampling: 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.96, 0.97, 0.98, 0.99
## (10 levels x 10 repeats; 9 individuals projected in each run)
## Done.

Non-African PCA

It’s possible that our dataset is not very sensitive to downsampling because of the large amount of variation among the populations. Here we subset to just the non-African samples (108 individuals) and repeat some of the above experiments.
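
A sketch of the subsetting step on toy data. The `Makro_Region` column appears in the column specification of `context_info` above, but the region label `"Africa"` is an assumption about how the values are coded.

```r
set.seed(5)

# Toy stand-ins (made up) for context_info and geno_matrix
context_info <- data.frame(
  Individual_ID = paste0("ind", 1:6),
  Makro_Region = c("Africa", "Europe", "Asia", "Africa", "Oceania", "Europe"),
  stringsAsFactors = FALSE
)
geno_matrix <- matrix(rbinom(6 * 10, size = 2, prob = 0.3), nrow = 6)

# Rows of the genotype matrix are assumed to be in the same order as context_info
keep <- context_info$Makro_Region != "Africa"   # "Africa" label is assumed
geno_non_african <- geno_matrix[keep, , drop = FALSE]
context_non_african <- context_info[keep, , drop = FALSE]

dim(geno_non_african)
dim(context_non_african)
```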

Basic non-African PCA (with and without projection)

## [1]   108 31813
## [1] 108   6

## projecting:   1  2  3  4  5  6  7  8  9 done.

Try many levels of downsampling

## projecting:   1  2  3  4  5  6  7  8  9 done.
## Downsampling: 0.5, 0.6, 0.7, 0.8, 0.9, then 0.91 to 0.997 in steps of 0.003
## (35 levels; 9 individuals projected at each level)
## Done.

Debug PCA shrinkage

Ignore this

## [1] 0.9949212
## [1] -0.942749
